4 research outputs found

    Prediction of protein subcellular localization based on primary sequence data

    This paper describes a system called prediction of protein subcellular localization (P2SL), which predicts the subcellular localization of proteins in eukaryotic organisms from the amino acid content of primary sequences, taking amino acid order into account. Our approach is to find the most frequent motifs for each protein (class) by clustering and then to use these motifs as features for classification. This allows classification independent of sequence length. Another important property of the approach is that it provides a means to perform reverse analysis and to extract rules. In addition, and more importantly, we describe a new encoding scheme for the amino acids that conserves biological function, based on the point accepted mutation (PAM) substitution matrix. We present preliminary results of our system on a two-class (dichotomy) classifier; it can be extended to multiple classes with some modifications. © Springer-Verlag Berlin Heidelberg 2003

    Prediction of protein subcellular localization based on primary sequence data [Birincil Dizi Veri Temelli Protein Hücre İçi Yer Belirleme Tahmini]

    Subcellular localization is crucial for determining the functions of proteins. A system called prediction of protein subcellular localization (P2SL) is designed to predict the subcellular localization of proteins in eukaryotic organisms from the amino acid content of primary sequences, taking amino acid order into account. The approach is to find the most frequent motifs for each protein in a given class by clustering with self-organizing maps, and then to use these motifs as features for classification with multilayer perceptrons. This allows classification independent of sequence length. In addition, a new encoding scheme for the amino acids is described that conserves biological function, based on the point accepted mutation (PAM) substitution matrix. Statistical test results of the system are presented on a four-class problem. P2SL achieves slightly higher prediction accuracy than similar studies. © 2004 IEEE
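
    The length-independence claim above can be illustrated with a minimal sketch. It is not the P2SL implementation: plain k-mer counting stands in for the self-organizing-map clustering, no PAM-based encoding is applied, and the function names (`kmer_counts`, `top_motifs`, `motif_features`) and the toy sequences are hypothetical. The point is only that a sequence of any length maps to a vector with one entry per selected motif.

```python
from collections import Counter


def kmer_counts(seq, k=3):
    """Count every overlapping length-k substring (candidate motif) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def top_motifs(sequences, k=3, n=5):
    """Return the n most frequent k-mers over a set of sequences.

    Assumption: simple counting replaces the clustering step described
    in the abstract. Ties are broken alphabetically for determinism.
    """
    total = Counter()
    for seq in sequences:
        total.update(kmer_counts(seq, k))
    ranked = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
    return [motif for motif, _ in ranked[:n]]


def motif_features(seq, motifs, k=3):
    """Map a sequence of any length to a fixed-size vector:
    the fraction of its k-mer windows that match each motif."""
    counts = kmer_counts(seq, k)
    windows = max(len(seq) - k + 1, 1)
    return [counts[m] / windows for m in motifs]


# Toy sequences (hypothetical, not real proteins).
motifs = top_motifs(["AAAB", "AAAC"], k=3, n=2)
short_vec = motif_features("AAAA", motifs)
long_vec = motif_features("AAAAAAAAAB", motifs)
```

    Both `short_vec` and `long_vec` have two entries, so a downstream classifier such as a multilayer perceptron would see inputs of the same size whatever the sequence length.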

    Short time series microarray data analysis and biological annotation [Kisa süreli mikrodizi serilerinin analizi ve biyolojik anlamlandirmasi]

    The significant gene list that results from microarray data analysis should be explained in terms of biological function. The aim of this study is to extract biologically related gene clusters from short time series microarray gene data by applying unsupervised methods, and to annotate those clusters automatically. In the first step, short time series microarray expression data are clustered according to similar expression profiles. Next, several biological data sources are integrated to gather information about the genes in one of those clusters, and new sub-clusters are created from this unified information. In the last step, the gene sub-clusters are biologically annotated using the information related to them. ©2008 IEEE
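
    The first step, grouping genes with similar expression profiles, can be sketched as follows. The abstract does not name the unsupervised method used, so this is an assumed stand-in: a greedy grouping by Pearson correlation against the first profile of each cluster. The threshold, the function names and the gene labels are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def cluster_profiles(profiles, threshold=0.9):
    """Greedy clustering: a gene joins the first cluster whose seed profile
    correlates with it at or above the threshold, else it starts a new one."""
    clusters = []  # list of (seed_profile, [gene names])
    for gene, profile in profiles.items():
        for seed, members in clusters:
            if pearson(seed, profile) >= threshold:
                members.append(gene)
                break
        else:
            clusters.append((profile, [gene]))
    return [members for _, members in clusters]


# Four time points per gene; labels and values are made up.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
}
groups = cluster_profiles(profiles)
```

    Correlation is used here because it compares the shape of a profile rather than its magnitude, which is one common choice for short series; the result depends on the order in which genes are visited.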

    A novel model-based method for feature extraction from protein sequences for classification [Siniflandirma için protein dizilerinin özniteliklerinin çikarilmasinda model tabanli yeni bir yöntem]

    Representation of amino-acid sequences constitutes the key point in classification of proteins into functional or structural classes. The representation should contain the biologically meaningful information hidden in the primary sequence of the protein. Conserved or similar subsequences are strong indicators of functional and structural similarity. In this study we present a feature mapping that takes into account the models of the subsequences of protein sequences. An expectation-maximization algorithm along with an HMM mixture model is used to cluster and learn the models of subsequences of a given set of proteins. © 2006 IEEE
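
    One way to picture a model-based feature mapping is sketched below. The abstract gives no details of the mapping, so everything here is an assumption: each subsequence model is reduced to a position-specific residue distribution (a much simpler object than an HMM), the models are given rather than learned by expectation-maximization, and a protein's feature for a model is the best log-likelihood over all of its windows. The names `window_loglik` and `model_features` are hypothetical.

```python
import math


def window_loglik(window, model, floor=1e-6):
    """Log-likelihood of a window under a position-specific model.

    model is a list of dicts, one per position, mapping residue -> probability.
    Residues missing from a position get a small floor probability.
    """
    return sum(math.log(pos.get(res, floor)) for res, pos in zip(window, model))


def model_features(seq, models):
    """One feature per model: the highest window log-likelihood in seq."""
    features = []
    for model in models:
        width = len(model)
        scores = [
            window_loglik(seq[i:i + width], model)
            for i in range(len(seq) - width + 1)
        ]
        # A sequence shorter than the model has no window to score.
        features.append(max(scores) if scores else float("-inf"))
    return features


# A toy two-position model that prefers the subsequence "AC".
toy_model = [{"A": 0.9, "C": 0.1}, {"C": 0.9, "A": 0.1}]
features = model_features("GGACGG", [toy_model])
```

    With several models, the resulting vector has one entry per model, so proteins of different lengths again share a common representation.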